Part 2: Processing data in GRASS
In this notebook we’ll go through the processing of MODIS LST daily time series data to derive relevant predictor variables for modeling the distribution of Aedes albopictus in Northern Italy. Furthermore, we’ll show how to obtain and process occurrence data and background points.
Let’s first go through some temporal concepts within GRASS GIS…
The TGRASS framework
GRASS GIS was the first FOSS GIS that incorporated capabilities to manage, analyze, process and visualize spatio-temporal data, as well as the temporal relationships among time series.
- TGRASS is fully based on metadata and does not duplicate any dataset
- Snapshot approach, i.e., adds time stamps to maps
- A collection of time stamped maps (snapshots) of the same variable are called space-time datasets or STDS
- Maps in a STDS can have different spatial and temporal extents
- Space-time datasets can be composed of raster, raster 3D or vector maps, and so we call them:
- Space time raster datasets (STRDS)
- Space time 3D raster datasets (STR3DS)
- Space time vector datasets (STVDS)
Temporal modules
GRASS temporal modules are named and organized following GRASS core naming scheme. In this way, we have:
- t.*: General modules to handle STDS of all types
- t.rast.*: Modules that deal with STRDS
- t.rast3d.*: Modules that deal with STR3DS
- t.vect.*: Modules that deal with STVDS
Other TGRASS notions
- Time can be defined as intervals (start and end time) or instances (only start time)
- Time can be absolute (e.g., 2017-04-06 22:39:49) or relative (e.g., 4 years, 90 days)
- Granularity is the greatest common divisor of the temporal extents (and possible gaps) of all maps in the space-time cube
- Topology refers to temporal relations between time intervals in a STDS.
TGRASS framework and workflow
Hands-on, let’s start
So let’s start…
# data directory
homedir = os.path.join(os.path.expanduser('~'), "grass_ncsu_2023")
# GRASS GIS database variables
grassbin = "grass"
grassdata = os.path.join(homedir, "grassdata")
location = "nc_spm_08_grass7"
mapset = "PERMANENT"
# create directories if not already existing
os.makedirs(grassdata, exist_ok=True)Importing species records
You can also get the occurrences directly from GBIF into GRASS by means of v.in.pygbif.
Creating random background points
# Create buffer around Aedes albopictus records
v.buffer input=aedes_albopictus output=aedes_buffer distance=2000
# Set computational region
g.region -p raster=lst_2014.001_avg
# Create a vector mask to limit background points
r.mapcalc expression="MASK = if(lst_2014.001_avg, 1, null())"
r.to.vect input=MASK output=vect_mask type=area
# Subtract buffers from vector mask
v.overlay ainput=vect_mask binput=aedes_buffer operator=xor output=mask_bg
# Generate random background points
v.random output=background_points npoints=1000 restrict=mask_bg seed=3749See extra slides for details about computational region and masks in GRASS GIS.
Create daily LST STRDS
# Create time series
t.create type=strds temporaltype=absolute \
output=lst_daily title="Average Daily LST" \
description="Average daily LST in degree C - 2014-2018"
# Get list of maps
g.list type=raster pattern="lst_201*" output=list_lst.csv
# Register maps in strds
t.register -i input=lst_daily file=list_lst.csv \
increment="1 days" start="2014-01-01"
# Get info about the strds
t.info input=lst_dailySee t.create and t.register
Generate environmental variables from LST STRDS
Long term monthly avg, min and max LST
for i in $(seq -w 1 12) ; do
t.rast.series input=lst_daily method=average \
where="strftime('%m', start_time)='${i}'" \
output=lst_average_${i}
t.rast.series input=lst_daily method=minimum \
where="strftime('%m', start_time)='${i}'" \
output=lst_minimum_${i}
t.rast.series input=lst_daily method=maximum \
where="strftime('%m', start_time)='${i}'" \
output=lst_maximum_${i}
doneSee t.rast.series manual for further details.
Bioclimatic variables
# Install extension
g.extension extension=r.bioclim
# Estimate temperature related bioclimatic variables
r.bioclim \
tmin=$(g.list type=raster pattern="lst_minimum_??" separator=",") \
tmax=$(g.list type=raster pattern="lst_maximum_??" separator=",") \
tavg=$(g.list type=raster pattern="lst_average_??" separator=",") \
output=worldclim_
# List output maps
g.list type=raster pattern="worldclim*"See r.bioclim manual for further details.
Spring warming
# Annual spring warming: slope(daily Tmean february-march-april)
t.rast.aggregate input=lst_daily output=annual_spring_warming \
basename=spring_warming suffix=gran \
method=slope granularity="1 years" \
where="strftime('%m',start_time)='02' or \
strftime('%m',start_time)='03' or \
strftime('%m', start_time)='04'"
# Average spring warming
t.rast.series input=annual_spring_warming \
output=avg_spring_warming \
method=averageSee t.rast.aggregate manual.
Autumnal cooling
# Annual autumnal cooling: slope(daily Tmean august-september-october)
t.rast.aggregate input=lst_daily output=annual_autumnal_cooling \
basename=autumnal_cooling suffix=gran \
method=slope granularity="1 years" \
where="strftime('%m',start_time)='08' or \
strftime('%m',start_time)='09' or \
strftime('%m', start_time)='10'"
# Average autumnal cooling
t.rast.series input=annual_autumnal_cooling \
output=avg_autumnal_cooling \
method=averageNumber of days with LSTmean >= 20 and <= 30
# Keep only pixels meeting the condition
t.rast.algebra -n \
expression="tmean_higher20_lower30 = if(lst_daily >= 20.0 && lst_daily <= 30.0, 1, null())" \
basename=tmean_higher20_lower30 suffix=gran nproc=7
# Count how many times per year the condition is met
t.rast.aggregate input=tmean_higher20_lower30 \
output=count_tmean_higher20_lower30 \
basename=tmean_higher20_lower30 suffix=gran \
method=count granularity="1 years"
# Average number of days with LSTmean >= 20 and <= 30
t.rast.series input=count_tmean_higher20_lower30 \
output=avg_count_tmean_higher20_lower30 method=averageSee t.rast.algebra manual for further details.
Number of consecutive days with LSTmean <= -2.0
# Create annual mask
t.rast.aggregate input=lst_daily output=annual_mask \
basename=annual_mask suffix=gran \
granularity="1 year" method=count
# Replace values by zero
t.rast.mapcalc input=annual_mask output=annual_mask_0 \
expression="if(annual_mask, 0)" \
basename=annual_mask_0
# Calculate consecutive days with LST <= -2.0
t.rast.algebra \
expression="lower_m2_consec_days = annual_mask_0 {+,contains,l} \
if(lst_daily <= -2.0 && lst_daily[-1] <= -2.0 || \
lst_daily[1] <= -2.0 && lst_daily <= -2.0, 1, 0)" \
basename=lower_m2_ suffix=gran nproc=7# Inspect values
t.rast.list input=lower_m2_consec_days \
columns=name,start_time,end_time,min,max
# Median number of consecutive days with LST <= -2
t.rast.series input=lower_m2_consec_days \
output=median_lower_m2_consec_days method=medianWe have all these maps within GRASS, how do we connect with R now? Let’s move to #part2